Trainable Coarse Bilingual Grammars for Parallel Text Bracketing

نویسنده

  • Dekai Wu
چکیده

We describe two new strategies to automatic bracketing of parallel corpora, with particular application to languages where prior grammar resources are scarce: (1) coarse bilingual grammars, and (2) unsupervised training of such grammars via EM (expectation-maximization). Both methods build upon a formalism we recently introduced called stochastic inversion transduction grammars. The first approach borrows a coarse monolingual grammar into our bilingual formalism, in order to transfer knowledge of one language's constraints to the task of bracketing the texts in both languages. The second approach generalizes the inside-outside algorithm to adjust the grammar parameters so as to improve the likelihood of a training corpus. Preliminary experiments on parallel English-Chinese text are supportive of these strategies.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Algorithm for Simultaneously Bracketing Parallel Texts by Aligning Words

We describe a grammarless method for simultaneously bracketing both halves of a parallel text and giving word alignments, assuming only a translation lexicon for the language pair. We introduce inversion-invariant transduction grammars which serve as generative models for parallel bilingual sentences with weak order constraints. Focusing on transduction grammars for bracketing, we formulate a n...

متن کامل

An Algorithm for Simultaneously Bracketing Parallel Texts by Aligning Words

We describe a grammarless method for simultaneously bracketing both halves of a parallel text and giving word alignments, assuming only a translation lexicon for the language pair. We introduce inversion-invariant transduction grammars which serve as generative models for parallel bilingual sentences with weak order constraints. Focusing on Wansduction grammars for bracketing, we formulate a no...

متن کامل

Stochastic Inversion Transduction Grammars, with Application to Segmentation, Bracketing, and Alignment of Parallel Corpora

We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with potential application to a variety of parallel corpus analysis problems. The formalism combines three tactics against the constraints that render finite-state transducers less useful: it skips directly to a context-free rat...

متن کامل

Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora

We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel corpus analysis applications. Aside from the bilingual orientation, three major features distinguish the formalism from the finite-state transducers more traditionally found in computational linguistic...

متن کامل

Recognizing Paraphrases And Textual Entailment Using Inversion Transduction Grammars

We present first results using paraphrase as well as textual entailment data to test the language universal constraint posited by Wu’s (1995, 1997) Inversion Transduction Grammar (ITG) hypothesis. In machine translation and alignment, the ITG Hypothesis provides a strong inductive bias, and has been shown empirically across numerous language pairs and corpora to yield both efficiency and accura...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1995